AITopics | target knowledge

target knowledge

Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting

Neural Information Processing Systems, Jun-19-2026, 17:13:35 GMT

Increasing computing power and the need for AI-assisted decision-making are driving the growing application of Large Language Models (LLMs). Along with this, the potential retention of sensitive data by LLMs has spurred increasing research into machine unlearning. However, existing unlearning approaches face a critical dilemma: aggressive unlearning compromises model utility, while conservative strategies preserve utility but risk hallucinated responses. This significantly limits LLMs' reliability in knowledge-intensive applications. To address this, we introduce a novel Attention-Shifting (AS) framework for selective unlearning.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Asia > China (0.28)
Europe > Belgium (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)
Overview (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting

Tan, Chenchen, Qu, Youyang, Li, Xinghao, Zhang, Hui, Cui, Shujie, Chen, Cunjian, Gao, Longxiang

arXiv.org Artificial Intelligence, Nov-4-2025

Increasing computing power and the need for AI-assisted decision-making are driving the growing application of large language models (LLMs). Along with this, the potential retention of sensitive data by LLMs has spurred increasing research into machine unlearning. However, existing unlearning approaches face a critical dilemma: aggressive unlearning compromises model utility, while conservative strategies preserve utility but risk hallucinated responses. This significantly limits LLMs' reliability in knowledge-intensive applications. To address this, we introduce a novel Attention-Shifting (AS) framework for selective unlearning. AS is driven by two design objectives: (1) context-preserving suppression that attenuates attention to fact-bearing tokens without disrupting LLMs' linguistic structure; and (2) hallucination-resistant response shaping that discourages fabricated completions when queried about unlearning content. AS realizes these objectives through two attention-level interventions: importance-aware suppression, applied to the unlearning set to reduce reliance on memorized knowledge, and attention-guided retention enhancement, which reinforces attention toward semantically essential tokens in the retained dataset to mitigate unintended degradation. These two components are jointly optimized via a dual-loss objective, which forms a soft boundary that localizes unlearning while preserving unrelated knowledge under representation superposition. Experimental results show that AS improves performance preservation over the state-of-the-art unlearning methods, achieving up to 15% higher accuracy on the ToFU benchmark and 10% on the TDEC benchmark, while maintaining competitive hallucination-free unlearning effectiveness. Compared to existing methods, AS demonstrates a superior balance between unlearning effectiveness, generalization, and response reliability.
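
The dual-loss idea in the abstract above can be illustrated with a toy calculation. The sketch below is only an assumption about what such an objective might look like, not the formulation from the paper: the function names, the importance-weighted suppression term, the negative-log retention term, and the weight `lam` are all hypothetical. Attention is represented as a plain list of per-token weights.

```python
import math
from typing import Sequence


def suppression_loss(attention: Sequence[float], importance: Sequence[float]) -> float:
    """Importance-weighted attention mass on a forget-set sample (hypothetical form).

    Lower values mean less attention on tokens marked as fact-bearing.
    """
    return sum(a * s for a, s in zip(attention, importance))


def retention_loss(attention: Sequence[float], essential: Sequence[bool]) -> float:
    """Negative log of the attention mass on essential tokens of a retained sample.

    This is zero when all attention already sits on essential tokens.
    """
    mass = sum(a for a, keep in zip(attention, essential) if keep)
    return -math.log(max(mass, 1e-12))  # clamp to avoid log(0)


def dual_loss(
    forget_attention: Sequence[float],
    forget_importance: Sequence[float],
    retain_attention: Sequence[float],
    retain_essential: Sequence[bool],
    lam: float = 1.0,  # hypothetical trade-off weight
) -> float:
    """Weighted sum of the two terms; both are minimized jointly."""
    return suppression_loss(forget_attention, forget_importance) + lam * retention_loss(
        retain_attention, retain_essential
    )


# Toy example: the middle token of the forget sample is the fact-bearing one.
importance = [0.0, 1.0, 0.0]
retain_attention = [0.5, 0.5]
retain_essential = [True, True]

loss_before = dual_loss([0.25, 0.5, 0.25], importance, retain_attention, retain_essential)
loss_shifted = dual_loss([0.5, 0.0, 0.5], importance, retain_attention, retain_essential)
```

In this toy setting, moving attention away from the fact-bearing token lowers the combined loss while the retention term stays at zero, which is the qualitative behavior the abstract describes.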

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.1721

Country:

Asia > China (0.28)
Europe > Belgium (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Understanding the Dilemma of Unlearning for Large Language Models

Zhang, Qingjie, Qian, Haoting, Huang, Zhicong, Hong, Cheng, Huang, Minlie, Xu, Ke, Zhang, Chao, Qiu, Han

arXiv.org Artificial Intelligence, Sep-30-2025

Unlearning seeks to remove specific knowledge from large language models (LLMs), but its effectiveness remains contested. On one side, "forgotten" knowledge can often be recovered through interventions such as light fine-tuning; on the other side, unlearning may induce catastrophic forgetting that degrades general capabilities. Despite active exploration of unlearning methods, interpretability analyses of the mechanism are scarce due to the difficulty of tracing knowledge in LLMs' complex architectures. We address this gap by proposing unPact, an interpretable framework for unlearning via prompt attribution and contribution tracking. Typically, it quantifies each prompt token's influence on outputs, enabling pre- and post-unlearning comparisons to reveal what changes. Across six mainstream unlearning methods, three LLMs, and three benchmarks, we find that: (1) unlearning appears to be effective by disrupting focus on keywords in the prompt; (2) much of the knowledge is not truly erased and can be recovered by simply emphasizing these keywords in prompts, without modifying the model's weights; (3) catastrophic forgetting arises from indiscriminate penalization of all tokens. Taken together, our results suggest an unlearning dilemma: existing methods tend to be either insufficient (knowledge remains recoverable by keyword emphasis) or overly destructive (general performance collapses due to catastrophic forgetting), still leaving a gap to reliable unlearning.
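
The idea of quantifying each prompt token's influence and comparing it before and after unlearning can be sketched with leave-one-out (occlusion) attribution. This is a generic technique swapped in for illustration and may differ from what unPact actually computes. The scoring functions are toy stand-ins for a model's confidence in the target answer, and all names and weights are hypothetical.

```python
from typing import Callable, Dict, List


def leave_one_out_attribution(
    tokens: List[str],
    score: Callable[[List[str]], float],
    mask: str = "[MASK]",
) -> List[float]:
    """Influence of each token = drop in score when that token is masked out."""
    base = score(tokens)
    influences = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [mask] + tokens[i + 1:]
        influences.append(base - score(perturbed))
    return influences


def make_toy_score(weights: Dict[str, float]) -> Callable[[List[str]], float]:
    """Toy stand-in for a model: the score is a sum of per-keyword weights."""
    def score(tokens: List[str]) -> float:
        return sum(weights.get(t, 0.0) for t in tokens)
    return score


prompt = ["Who", "is", "the", "author", "of", "Hamlet", "?"]

# Hypothetical "before" and "after unlearning" models: the second one
# relies less on the keyword "Hamlet".
before = leave_one_out_attribution(prompt, make_toy_score({"author": 0.25, "Hamlet": 0.5}))
after = leave_one_out_attribution(prompt, make_toy_score({"author": 0.25, "Hamlet": 0.125}))

# Per-token change in influence; a large positive value marks a token
# whose contribution was disrupted.
shift = [b - a for b, a in zip(before, after)]
most_disrupted = prompt[max(range(len(prompt)), key=shift.__getitem__)]
```

With a real model, `score` would be replaced by something like the log-probability of the reference answer given the (masked) prompt.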

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.24675

Country: Europe (0.67)

Genre: Research Report > New Finding (0.69)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning

Yang, Nakyeong, Kim, Dong-Kyum, Kwon, Jea, Kim, Minsung, Jung, Kyomin, Cha, Meeyoung

arXiv.org Artificial Intelligence, Sep-29-2025

Large language models trained on web-scale data can memorize private or sensitive knowledge, raising significant privacy risks. Although some unlearning methods mitigate these risks, they remain vulnerable to "relearning" during subsequent training, allowing a substantial portion of forgotten knowledge to resurface. In this paper, we show that widely used unlearning methods cause shallow alignment: instead of faithfully erasing target knowledge, they generate spurious unlearning neurons that amplify negative influence to hide it. Experimental results confirm that our method reliably erases target knowledge and outperforms strong baselines across two practical retraining scenarios: (1) adversarial injection of private data, and (2) benign attack using an instruction-following benchmark. Our findings highlight the necessity of robust and faithful unlearning methods for safe deployment of language models.

Large language models (LLMs) are built on vast corpora of web-scale data, equipping them with broad capabilities across diverse tasks. Yet, this scale introduces privacy risks, as training datasets may inadvertently contain sensitive or personally identifiable information. In response, prior works have explored strategies to remove private or sensitive knowledge from LLMs. Such approaches include gradient-based interventions (Jang et al., 2022; Maini et al., 2024), preference-driven optimization frameworks (Jin et al., 2024; Yang et al., 2025), and representation learning techniques (Li et al., 2024), each of which aims to mitigate privacy risks embedded in model parameters. Despite these efforts, prior studies reveal that existing unlearning techniques often fail to robustly eliminate target knowledge. Models subjected to such interventions remain susceptible to prompt-based elicitation (Jin et al., 2024; Yang et al., 2025) and can inadvertently recover forgotten information through representational shifts introduced by subsequent training (Deeb & Roger, 2024; Hu et al., 2024).

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.22263

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models

Lee, Hwiyeong, Hwang, Uiji, Lim, Hyelim, Kim, Taeuk

arXiv.org Artificial Intelligence, Sep-18-2025

Large language models often retain unintended content, prompting growing interest in knowledge unlearning. Recent approaches emphasize localized unlearning, restricting parameter updates to specific regions in an effort to remove target knowledge while preserving unrelated general knowledge. However, their effectiveness remains uncertain due to the lack of robust and thorough evaluation of the trade-off between the competing goals of unlearning. In this paper, we begin by revisiting existing localized unlearning approaches. We then conduct controlled experiments to rigorously evaluate whether local parameter updates causally contribute to unlearning. Our findings reveal that the set of parameters that must be modified for effective unlearning is not strictly determined, challenging the core assumption of localized unlearning that parameter locality is inherently indicative of effective knowledge removal.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2505.16252

Country:

Asia > Middle East (0.46)
North America > Mexico (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Identifying Multi-modal Knowledge Neurons in Pretrained Transformers via Two-stage Filtering

Sato, Yugen, Takagi, Tomohiro

arXiv.org Artificial Intelligence, Mar-28-2025

Recent advances in large language models (LLMs) have led to the development of multimodal LLMs (MLLMs) in the fields of natural language processing (NLP) and computer vision. Although these models allow for integrated visual and language understanding, they present challenges such as opaque internal processing and the generation of hallucinations and misinformation. Therefore, there is a need for a method to clarify the location of knowledge in MLLMs. In this study, we propose a method to identify neurons associated with specific knowledge using MiniGPT-4, a Transformer-based MLLM. Specifically, we extract knowledge neurons through two stages: activation-difference filtering using inpainting and gradient-based filtering using GradCAM. Experiments on the image caption generation task using the MS COCO 2017 dataset, with quantitative evaluation by BLEU, ROUGE, and BERTScore and qualitative evaluation using activation heatmaps, showed that our method locates knowledge with higher accuracy than existing methods. This study contributes to the visualization and explainability of knowledge in MLLMs and shows the potential for future knowledge editing and control.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2503.22941

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
(3 more...)

Genre: Research Report > New Finding (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets

Wang, Wenyu, Zhang, Mengqi, Ye, Xiaotian, Ren, Zhaochun, Chen, Zhumin, Ren, Pengjie

arXiv.org Artificial Intelligence, Mar-6-2025

Large Language Models (LLMs) inevitably acquire harmful information during training on massive datasets. LLM unlearning aims to eliminate the influence of such harmful information while maintaining the model's overall performance. Existing unlearning methods, represented by gradient ascent-based approaches, primarily focus on forgetting target data while overlooking the crucial impact of logically related knowledge on the effectiveness of unlearning. In this paper, through both theoretical and experimental analyses, we first demonstrate that a key reason for the suboptimal unlearning performance is that models can reconstruct the target content through reasoning with logically related knowledge. To address this issue, we propose Unlearning Improvement via Parameter Extrapolation (UIPE), a method that removes knowledge highly correlated with the forgetting targets. Experimental results show that UIPE significantly enhances the performance of various mainstream LLM unlearning methods on the TOFU benchmark.

information, knowledge, target forget, (15 more...)

arXiv.org Artificial Intelligence

2503.04693

Country:

Europe > Netherlands > South Holland > Leiden (0.04)
Asia > China > Beijing > Beijing (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Why pre-training is beneficial for downstream classification tasks?

Jiang, Xin, Cheng, Xu, Li, Zechao

arXiv.org Artificial Intelligence, Oct-10-2024

Pre-training has exhibited notable benefits to downstream tasks by boosting accuracy and speeding up convergence, but the exact reasons for these benefits remain unclear. To this end, we propose to quantitatively and explicitly explain the effects of pre-training on the downstream task from a novel game-theoretic view, which also sheds new light on the learning behavior of deep neural networks (DNNs). Specifically, we extract and quantify the knowledge encoded by the pre-trained model, and further track the changes of such knowledge during the fine-tuning process. Interestingly, we discover that only a small amount of the pre-trained model's knowledge is preserved for the inference of downstream tasks. However, such preserved knowledge is very challenging for a model trained from scratch to learn. Thus, with the help of this exclusively learned and useful knowledge, the model fine-tuned from pre-training usually achieves better performance than the model trained from scratch. Besides, we discover that pre-training can guide the fine-tuned model to learn target knowledge for the downstream task more directly and quickly, which accounts for the faster convergence of the fine-tuned model.

artificial intelligence, knowledge, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2410.08455

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Learn To Learn More Precisely

Cheng, Runxi, Wei, Yongxian, He, Xianglong, Zhu, Wanyun, Huang, Songsong, Yu, Fei Richard, Ma, Fei, Yuan, Chun

arXiv.org Artificial Intelligence, Aug-8-2024

Meta-learning has been extensively applied in the domains of few-shot learning and fast adaptation, achieving remarkable performance. While meta-learning methods like Model-Agnostic Meta-Learning (MAML) and its variants provide a good set of initial parameters for the model, the model still tends to learn shortcut features, which leads to poor generalization. In this paper, we propose the formal conception of "learn to learn more precisely", which aims to make the model learn precise target knowledge from data and reduce the effect of noisy knowledge, such as background and noise. To achieve this target, we propose a simple and effective meta-learning framework named Meta Self-Distillation (MSD) to maximize the consistency of learned knowledge, enhancing the model's ability to learn precise target knowledge. In the inner loop, MSD uses different augmented views of the same support data to update the model respectively. Then in the outer loop, MSD utilizes the same query data to optimize the consistency of learned knowledge, enhancing the model's ability to learn more precisely. Our experiments demonstrate that MSD exhibits remarkable performance in few-shot classification tasks in both standard and augmented scenarios, effectively boosting the accuracy and consistency of knowledge learned by the model.
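
The inner-loop and outer-loop structure described in the abstract above can be sketched on a tiny linear model. This is a minimal illustration under stated assumptions, not the MSD implementation: the linear regressor, the single gradient step, the hand-written "augmented" view, and the use of cosine similarity as the consistency measure are all hypothetical choices. The outer-loop meta-update, which would adjust the initial weights to raise this consistency, is omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Vector, float]  # (input features, target)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def inner_update(w: Sequence[float], support: Sequence[Example], lr: float = 0.1) -> Vector:
    """One gradient step on mean squared error over one view of the support set."""
    grad = [0.0] * len(w)
    for x, y in support:
        err = predict(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(support)
    return [wj - lr * gj for wj, gj in zip(w, grad)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined direction; treat as no agreement
    return dot / (norm_a * norm_b)


def consistency(
    w: Sequence[float],
    view_a: Sequence[Example],
    view_b: Sequence[Example],
    query_inputs: Sequence[Vector],
) -> float:
    """Adapt once per view, then compare both adapted models on the same query data."""
    w_a = inner_update(w, view_a)
    w_b = inner_update(w, view_b)
    out_a = [predict(w_a, x) for x in query_inputs]
    out_b = [predict(w_b, x) for x in query_inputs]
    return cosine(out_a, out_b)


w0 = [0.5, -0.5]
support = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
augmented = [([1.1, 0.0], 1.0), ([0.0, 0.9], 0.0)]  # hypothetical augmentation
query = [[1.0, 1.0], [1.0, 0.0]]

same_view_score = consistency(w0, support, support, query)
two_view_score = consistency(w0, support, augmented, query)
```

An outer loop would treat `two_view_score` (or one minus it) as the objective and differentiate through the inner updates with respect to `w0`; an automatic differentiation framework would normally handle that step.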

consistency, knowledge, learning, (12 more...)

arXiv.org Artificial Intelligence

2408.0459

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning

Wang, Renzhi, Li, Piji

arXiv.org Artificial Intelligence, May-28-2024

Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of Large Language Models (LLMs) to various downstream applications. However, the effectiveness of PEFT diminishes notably when downstream tasks require accurate learning of factual knowledge. In this paper, we adopt a semantic perspective to investigate this phenomenon, uncovering the reasons behind PEFT's limitations in knowledge learning tasks. Our findings reveal that: (1) PEFT presents a notable risk of pushing the model away from the intended knowledge target; (2) multiple pieces of knowledge interfere with each other, and such interference suppresses the learning and expression of knowledge features. Based on these insights, we introduce a data filtering strategy to exclude data that is detrimental to knowledge learning and a re-weighted learning strategy to make the model attentive to semantic distance during knowledge learning. Experimental results demonstrate the effectiveness of the proposed method on open-source large language models and further validate the semantic challenge in PEFT, paving the way for future research.

knowledge, semantic distance, target knowledge, (14 more...)

arXiv.org Artificial Intelligence

2405.18292

Country:

Asia > Singapore (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
Africa > Rwanda > Kigali > Kigali (0.04)
(15 more...)

Genre: Research Report > New Finding (0.86)

Industry: Government > Regional Government > North America Government > United States Government (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
